Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce EntityRetrievingClosestReferencedEntityIdLookup #195

Merged
merged 1 commit into from
Apr 24, 2018

Conversation

mariushoch
Copy link
Member

@mariushoch mariushoch force-pushed the EntityRetrievingClosestReferencedEntityIdLookup branch from 4200f7c to a0537ac Compare April 14, 2018 13:58
* @param int $expectedNumberOfGetEntityCalls
* @return EntityLookup
*/
private function maskEntityLookup( EntityLookup $entityLookup, $expectedNumberOfGetEntityCalls ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is mask a typo? If not, I am confused by this verb.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was not, but I see that it's not accurate… so changed to restrictEntityLookup.


/**
* @param EntityLookup $entityLookup
* @param int $expectedNumberOfGetEntityCalls
Copy link
Contributor

@JeroenDeDauw JeroenDeDauw Apr 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why have this argument an an if-else? Looks like you can just have two simple methods

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, can even be one simpler method… changed to just use phpunit mocks all the time and not sometimes rely on RestrictedEntityLookup.

* @param int $expectedPrefetches
* @return EntityPrefetcher
*/
private function getEntityPrefetcher( $expectedPrefetches ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the fundraising codebase we use the convention to name such methods in tests that construct new stuff newStuff and not getStuff. I think that is clearer, since then you know you get a new instance every time and can happily cause changes in its state without other stuff blowing up

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also a fan of the new… naming convention.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, changed.

Copy link
Member

@lucaswerkmeister lucaswerkmeister left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM in general, not tested manually yet (but I guess that would have to wait until a Lua binding is available in some other change?)

return $result;
}
}
// Remove already visitted entities
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: visited

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done :)

}

/**
* @return InMemoryEntityLookup
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does anything else depend on this specifically returning an InMemoryEntityLookup and not any EntityLookup?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, changed the return type hint.

* @param int $expectedPrefetches
* @return EntityPrefetcher
*/
private function getEntityPrefetcher( $expectedPrefetches ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also a fan of the new… naming convention.

private $entityPrefetcher;

/**
* @var int Maximum search depth: Maximum number of intermediate entities to search through.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, this is confusing. A "maximum depth" is not the same as a "maximum number of entities to search through". This search spans a tree. If the tree is very flat, the search depth might be something like 5, while a few thousand entities have been traversed already. Please count the entities, and not the depth. We run into issues with this in the QualityConstrains codebase already. @lucaswerkmeister might be able to provide an example.

Further down in the constructor I see both a depth and some maximum number. It's hard to tell what they do. I think a depth is not needed.

If you think having both is helpful, I would not make the depth throw an exception. An exception is only thrown if the maximum number of entities is reached. My reasoning for this idea is: Think of a tree with two branches, the first is extremely deep, while the second is only a few entities deep. Both together exceed the "maximum number of entities" setting. If the depth-check throws an exception, nothing would be found. But if the depth-check only makes the code stop traversing a particular branch, the second, smaller branch would be found.

I hope this makes sense.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope I correctly understood your comment, if not, please correct me!

Please note that this is not doing a depth-first search, but a breadth-first search, thus there's no need to make "the code stop traversing a particular branch". In this context I think it might be useful for a caller to say that they only want us to look for paths with a certain maximum length.
The entity visit limitation is just an added safety, which I don't imagine firing very often, but only in specific cases where one or more visited entities references many other entities.

I agree that the documentation for this might be improvable. I'll see what I can do there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid I still don't know what this integer represents. The documentation uses the words "depth" (as in how deep a tree is) and "number" (as in the number of visited leaves) in the same sentence. For me these are two very different concepts. Which one does this integer represent?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that this represents the maximum depth of the tree. You can also think of the depth as the number of entities above the current leaf (i. e., intermediate entities) at any point. Does that make sense?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can also think of the depth as the number of entities above the current leaf […]

"Above the current leave" is an other tree. What "number of entities" from this tree is meant? A horizontal or a vertical slice?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maximum number of intermediate entities to search through, this means given we have a path Q1 -> … -> Q42, this is the number of entities represented by the .

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So "maxDepth" is indeed the maximum depth of that particular tree. :-) Thanks!

return null;
}

$this->alreadyVisited = [];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't you keep this cache for the lifetime of the object?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, because we want to maybe visit the same entities again, if another (or even the same) reference-relationship is queried.
We may have a caching decorator for this in the future, but that's not this classes business.

* @return StatementListProvider|null Null if not applicable.
*/
private function getEntity( EntityId $id, EntityId $fromId, PropertyId $propertyId, array $toIds ) {
$this->alreadyVisited[$id->getSerialization()] = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I miss a safety check here that starts screaming when this is called with an already visited entity. As far as I can see this is not required, as the caller makes sure this can't happen. But this is a little hidden in an array_diff, and I would feel better with an additional isset check. It's cheap.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a LogicException here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed this to actually trigger a warning now… throwing an exception is maybe a bit harsh.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trigger_error causes a global side effect. Without knowing the context here I can't say what is appropriate, though normally you either throw an exception and let the caller deal with it, which might be ignoring the error, escalating it, logging it, etc. Or, in cases where the execution in the local context should not be aborted, you log something or otherwise output info via some collaborator (which could also be an event thing or whatever).

@mariushoch mariushoch force-pushed the EntityRetrievingClosestReferencedEntityIdLookup branch from a0537ac to 1d0f5bd Compare April 17, 2018 22:42
@lucaswerkmeister
Copy link
Member

lucaswerkmeister commented Apr 20, 2018

One thing I just remembered – what’s the status of usage tracking in this service? We discussed this in #193, but I don’t see anything related in this pull request.

@mariushoch
Copy link
Member Author

what’s the status of usage tracking in this service?

I discussed this with @lydiapintscher and we probably don't need usage tracking for now.

If we want usage tracking later on, we would need this service to track the specific patch chosen… or in this case, where the service even returns the shortest path, we would need to track all visited entities, if we wanted to keep this guarantee all the time.

@JonasKress JonasKress merged commit 5b49bb4 into master Apr 24, 2018
@JeroenDeDauw JeroenDeDauw deleted the EntityRetrievingClosestReferencedEntityIdLookup branch April 25, 2018 04:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants